Abstract

A new theoretical survey of proteins' resistance to constant speed stretching is performed for a set of 
17 134 proteins as described by a structure-based model. The proteins selected have no gaps in their 
structure determination and consist of not more than 250 amino acids. Our previous studies have dealt 
with 7510 proteins of no more than 150 amino acids. 

The proteins are ranked according to the strength of the resistance. Most of the predicted top-strength 
proteins have not yet been studied experimentally. Architectures and folds which are likely to yield
large forces are identified. New types of potent force clamps are discovered. They involve disulphide
bridges and, in particular, cysteine slipknots. An effective energy parameter of the model is estimated by 
comparing the theoretical data on characteristic forces to the corresponding experimental values combined 
with an extrapolation of the theoretical data to the experimental pulling speeds. 

These studies provide guidance for future experiments on single molecule manipulation and should lead 
to selection of proteins for applications. A new class of proteins, involving cysteine slipknots, is identified
as one that is expected to lead to the strongest force clamps known. This class is characterized through 
molecular dynamics simulations. 

Author Summary 

The advances in nanotechnology have allowed for manipulation of single biomolecules and determination 
of their elastic properties. Titin was among the first proteins studied in this way. Its unravelling by 
stretching requires a 204 pN force. The resistance to stretching comes mostly from a localized region 
known as a force clamp. In titin, the force clamp is simple as it is formed by two parallel β-strands
that are sheared on pulling. Studies of a set of fewer than a hundred proteins accomplished in the last
decade have revealed a variety of force clamps that lead to forces ranging from under 20 pN to
about 500 pN. This set comprises only a tiny fraction of the known proteins. Thus one needs guidance as
to what proteins should be considered for specific mechanical properties. Such guidance is provided
here through simulations within simplified coarse-grained models of 17 134 proteins that are stretched
at constant speed. We correlate their unravelling forces with two structure classification schemes. We
identify proteins with large resistance to unravelling and characterize their force clamps. Quite a few
top-strength proteins owe their sturdiness to a new type of force clamp: the cysteine slipknot, in which the
force peak is due to dragging of a piece of the backbone through a closed ring formed by two other pieces
of the backbone and two connecting disulphide bonds.



Introduction 

Atomic force microscopy, optical tweezers, and other tools of nanotechnology have enabled induction 
and monitoring of large conformational changes in biomolecules. Such studies are performed to assess 
structure of the biomolecules, their elastic properties, and ability to act as nanomachines in a cell.
Stretching studies of proteins [1] are of particular current interest and they have been performed for
fewer than a hundred systems. Interpretation of some of these experiments has been helped by all-atom
simulations, such as those reported in refs. [2,3]. They are limited to time scales of order 100 ns and thus
require using unrealistically large constant pulling speeds. However, they often elucidate the nature of
the force clamp - the region responsible for the largest force of resistance to pulling, F_max. All of the
experimental and all-atom simulational studies address merely a tiny fraction of the proteins that are stored
in the Protein Data Bank (PDB) [4]. Thus it appears worthwhile to consider a large set of proteins
and determine their F_max within an approximate model that allows for fast and yet reasonably accurate
calculations. Structure-based models of proteins, as pioneered by Go and his collaborators [5] and used
in several implementations [6-13], seem to be suited to this task especially well since they are defined in
terms of the native structures away from which stretching is imposed.

There are many ways, all phenomenological, to construct a structure-based model of a protein. 504
possible variants are enumerated and 62 are studied in detail in ref. [14]. The variants differ by
the choice of effective potentials, nature of the local backbone stiffness, energy-related parameters, and
the coarse-grained degrees of freedom. The most crucial choice concerns the decision about
which interactions between amino acids count as native contacts. Comparing F_max to the corresponding
experimental values in 36 available cases selects several optimal models [14]. Among them, there is one
which is very simple and which describes a protein in terms of its Cα atoms, as labeled by the sequential
index i. This model is denoted by LJ3 = {6-12, C, M3, E0}, which stands for, respectively, the
Lennard-Jones native contact potentials, local backbone stiffness represented by harmonic terms that
favor the native values of local chiralities, the contact map in which there are no i,i+2 contacts, and
a uniform amplitude, ε, of the Lennard-Jones potential. The contact map is determined by assigning
van der Waals spheres to the heavy atoms (enlarged by a factor to account for attraction) and by
checking whether spheres belonging to different amino acids overlap in the native state [15,16]. If they
do, a contact is declared as native. Non-native contacts are considered repulsive. Application of this
criterion frequently selects the i,i+2 contacts as native. If the contact map includes these contacts, the
resulting model will be denoted here as LJ2. On average, it performs worse than LJ3 because the i,i+2
contacts usually correspond to weak van der Waals couplings, as can be demonstrated in a sample of
proteins by using software [17] which analyses atomic configurations from the chemical perspective of
molecular bonds. Thus the i,i+2 couplings are better removed from the contact map (in most
cases).
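The overlap criterion for native contacts can be sketched as follows. This is a minimal illustration and not the code used in the survey: the function names, the data layout, the van der Waals radii, and the enlargement factor of 1.24 are all assumptions made for the example.

```python
import math

# Illustrative van der Waals radii in Å (assumed values, not taken from the text).
VDW_RADIUS = {"C": 1.9, "N": 1.7, "O": 1.4, "S": 1.8}

def native_contacts(residues, enlarge=1.24, min_separation=3):
    """Return the set of (i, j) residue pairs that count as native contacts.

    residues: list of residues; each residue is a list of heavy atoms
              given as (element, (x, y, z)) with coordinates in Å.
    enlarge: factor multiplying the radii to account for attraction
             (assumed value).
    min_separation: smallest |i - j| considered; 3 drops the i,i+2 contacts
                    (an LJ3-like map), 2 keeps them (an LJ2-like map).
    """
    contacts = set()
    for i in range(len(residues)):
        for j in range(i + min_separation, len(residues)):
            if _overlap(residues[i], residues[j], enlarge):
                contacts.add((i, j))
    return contacts

def _overlap(res_a, res_b, enlarge):
    """True if any pair of enlarged atomic spheres from the two residues overlaps."""
    for elem_a, pos_a in res_a:
        for elem_b, pos_b in res_b:
            cutoff = enlarge * (VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b])
            if math.dist(pos_a, pos_b) < cutoff:
                return True
    return False
```

Calling the function with min_separation=2 instead of 3 keeps the i,i+2 pairs, which is the only difference between the two contact maps discussed above.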

The survey to determine F_max in 7510 model proteins with the number of amino acids, N, not
exceeding 150 and 239 longer proteins (with N up to 851) has been accomplished twice: first within the
LJ2 model [18] and soon afterwards within the LJ3 model [19]. The first survey also comes with many
details of the methodology whereas the second just presents the outcomes. The two surveys are compared
in more detail in refs. [14,20]. The results differ, particularly when it comes to the ranking of the proteins
according to the value of F_max, but taken together they provide error bars on the findings. They both agree,
however, in predicting that there are many proteins whose strength should be considerably larger than
that of the frequently studied benchmark - the sarcomere protein titin (F_max of order 204 pN [21,22]). Near the
top of the list, there is the scaffoldin protein c7A (the PDB code 1aoh) which has been recently measured
to have F_max of about 480 pN [23]. Other findings include establishing correlations with the CATH
hierarchical classification scheme [24,25], such as that there are no strong α proteins, and identification
of several types of force clamps. The large forces most commonly originate in parallel β-strands
that are sheared [26]. However, there are also clamps with antiparallel β-strands, unstructured strands,
and other kinds.

The two surveys have been based on the structure download made on July 26, 2005 when the PDB 
comprised 29 385 entries. Many of them correspond to nucleic acids, complexes with nucleic acids and 
with other proteins, carbohydrates, or come with incomplete files, hence the much smaller number of
proteins that could be used in the molecular dynamics studies. Here, we present results of still another
survey which is based on a download of December 18, 2008 which contains 54 807 structure files and
leads to 17 134 acceptable structures with N not exceeding 250 (instead of 150). These structures are
then analyzed through simulations based on the LJ3 model. The numerical code has been improved to
allow for acceleration of calculations by a factor of 2.

The 190 structures (or 1.1 % of all structures considered) with the top values of F_max in units of ε/Å
are shown in Table 1 (the first 81 entries for which F_max > 3.9 ε/Å) and Table S1 of the SI (proteins
ranked 82 through 190), together with the values for titin (1tit) and ubiquitin (1ubq) to provide a scale.
As argued in the Materials and Methods section, the unit of force, ε/Å, is now estimated to be of
order 110 pN. All of the corresponding proteins are predicted to be much stronger than titin and none
but two of them (1aoh, 1g1k [23]) have been studied experimentally yet. In addition to the types of force
clamps identified before, we have discovered two new mechanisms of sturdiness. One of them involves
a cysteine slipknot (CSK) and is found to be operational in all of the 13 top-strength proteins. In this
motif, a slip-loop is pulled out of a cysteine knot-loop. Another involves dragging of a single fragment of 
the main chain across a cysteine knot-loop. The two mechanisms are similar in spirit since both involve 
dragging of the backbone. However, in the CSK case, two fragments of the backbone are participating. 
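For orientation, the conversion between model units and piconewtons can be sketched as below. It assumes the order-of-magnitude estimate of 110 pN per ε/Å quoted above; the constant and function names are illustrative only.

```python
# Estimated value of the model force unit ε/Å in pN, as quoted in the text (approximate).
PN_PER_UNIT = 110.0

def to_piconewtons(force_in_model_units):
    """Convert a force given in ε/Å to pN using the order-of-magnitude estimate."""
    return force_in_model_units * PN_PER_UNIT

def to_model_units(force_in_pn):
    """Convert a force given in pN to ε/Å."""
    return force_in_pn / PN_PER_UNIT

# Threshold used for Table 1, and the experimental value for titin.
print(round(to_piconewtons(3.9)))      # 429
print(round(to_model_units(204), 2))   # 1.85
```

Since the unit itself is only an estimate, the converted values should be read as indicative rather than precise.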

We make a more systematic identification of the CATH-classified architectures that are linked to
mechanical strength and then analyze correlations of the data to the SCOP-based grouping (version 
1.73) [27-29]. The previous surveys did not relate to the SCOP scheme. 

We identify the CATH-based architectures and SCOP-based folds that are associated with the
occurrence of a strong resistance to pulling. A general observation, however, is that each such group of
structures may also include examples of proteins that unravel easily. The dynamics of a protein are
very sensitive to mechanical details that are largely captured by the contact map and not just by the
appearance of a structure. On the other hand, if one were to look for mechanically strong proteins then
the architectures and folds identified by us should provide a good starting point. We also study the
dependence of F_max on the pulling velocity and characterize the dependence on N through distributions
of the forces.

The current third survey has been performed within the same LJ3 model as the second survey [19]. 
However, we reuse and extend it here because the editors of Biophysical Journal retracted the second 
survey [30]. All of the values of F_max are deposited at the website www.ifpan.edu.pl/BSDB (Biomolecule
Stretching Database) and can be accessed through the PDB structure code.

Results and Discussion 
Distribution of Forces 

The distribution of all values of F_max for the full set of proteins is shown in Figure 1. Despite the larger
limit on N now allowed, the distribution is rather similar to that obtained in ref. [19] for the smaller
number of proteins (and with the smaller sizes). The similarity is primarily due to the fact that the
size-related effects, discussed below, are countered by new types of proteins that are now incorporated
into the survey. The distribution is peaked around F_max of 1.2 ε/Å, which constitutes about 60% of the
strength associated with titin. The distribution is non-Gaussian: it has a zero-force peak and a long
high-force tail. The zero-force peak arises in some proteins with covalent disulphide bonds. In the model,
such bonds are represented by strong harmonic bonds. Stretching of such a protein may not result in
any force peak before a disulphide bond gets stretched indefinitely and hence F_max is considered to be
vanishing then. The tail, on the other hand, corresponds to the strong proteins. The strongest 1.1%
of all proteins are listed in Tables 1 (in the main text) and S1 (in the SI).

The insets of Figure 1 show similar distributions for proteins belonging to the particular CATH-based
classes. There are four such classes: α, β, α-β, and proteins with no apparent secondary structures. It
is seen that none of the 3240 α proteins exceeds the peak force obtained for titin within our model. This
observation is in agreement with experiments on several α proteins that are listed in the Materials and
Methods section. All strong proteins are seen to involve β-strands. The peak in the probability
distribution for the α-β proteins is observed to be shifted towards larger values of F_max compared
to the one for the β proteins. At the same time, the high-force tail of the distribution for the β proteins
is substantially more populated than the corresponding tail for the α proteins.

Figure 2 is similar to Figure 1 in spirit, but now the structures are split into particular ranges of
the protein sizes: N between 40 and 100 (the dotted line), between 100 and 150 (thin solid line), and
between 200 and 250 (the thick solid line). The curve for the range from 150 to 200 lies between the
curves corresponding to the neighboring ranges and is not shown in order not to crowd the Figure. The
distributions are seen to shift to the right with increasing N, indicating
that the bigger the number of amino acids, the more likely a protein is to have a large value of F_max.
This observation holds for all classes of the proteins, as evidenced by the insets in the Figure.

In most cases, the major force peak arises at the beginning of stretching where the Go-like model should
be most adequate. One can characterize the location of F_max during the stretching process
by a dimensionless parameter λ which is defined in terms of the end-to-end distance, as spelled out in
the caption of Table 1. This parameter is equal to 0 in the native state and to 1 in the fully extended
state. In 25 % of the proteins studied in this survey, λ was less than 0.25 and in 52 % it was less than 0.5.
There are very few proteins with λ exceeding 0.8.
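One natural reading of such a parameter is a linear rescaling of the end-to-end distance at the force peak between its native and fully extended values. The sketch below assumes that form, and it also assumes a fully extended length of (N-1) times 3.8 Å; the precise definition is the one given in the caption of Table 1, and the names used here are illustrative.

```python
CA_CA_DISTANCE = 3.8  # Å, typical distance between consecutive Cα atoms (assumed)

def peak_location(l_peak, l_native, n_residues):
    """Dimensionless location of the force peak along the stretching path.

    l_peak: end-to-end distance (Å) at which F_max occurs.
    l_native: end-to-end distance (Å) in the native state.
    n_residues: number of amino acids N.
    Returns 0 for the native state and 1 for the fully extended chain.
    """
    l_full = (n_residues - 1) * CA_CA_DISTANCE
    return (l_peak - l_native) / (l_full - l_native)
```

For a hypothetical 101-residue chain with a native end-to-end distance of 30 Å whose force peak occurs at 100 Å, this gives (100 - 30) / (380 - 30) = 0.2, i.e. a peak early in the stretching process.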

Table 1 does not include any (non-cysteine-based) knotted proteins. The full list of 17 134 proteins 
contains 42 such proteins but they come with moderate values of F_max. However, knotted proteins with
N > 250 may turn out to have different properties. 



Biological properties of the strongest proteins 

A convenient way to learn about the biological properties of the proteins listed in Tables 1 and S1 is through the
Gene Ontology data base [31] which links such properties with the PDB structure codes. The properties are
divided into three domains. The first of these is "molecular function" which describes a molecular function 
of a gene product. The second is "biological processes" and it covers sets of molecular events that have 
well defined initial and final stages. The third is "cellular component" and it specifies a place where a 
given gene product is most likely to act. 

The results of our findings are summarised in Table 2. It can be seen that most of the 190 strongest
proteins are likely to be found in an extracellular space where conditions are much more oxidizing than
within cells. Larger mechanical stability is advantageous under such conditions. 90 of these strongest
proteins exhibit hydrolase activity. 39 of these 90 are serine-type endopeptidases. These findings seem to
be consistent with expectations regarding proteins endowed with high mechanical stability. For instance,
proteases, which are well represented in Table 2, should be more stable to prevent self-cleavage.



CATH-based architectures 

The classification of proteins within the CATH (Class, Architecture, Topology, Homology) data base is 
done semi-automatically by applying numerical algorithms to structures that are resolved to better than
4 Å [24,25]. The four classes of proteins in the CATH system are split into architectures, depending
on the overall spatial arrangement of the secondary structures, the numbers of β-sheets in various motifs,
and the like. The next finer step in this hierarchical scheme is into topologies and it involves counting
contacts between amino acids which are sequentially separated by more than a threshold. The further
divisions into homologous superfamilies and then sequence family levels involve studies of the sequence
identity.

We have found that only six architectures contribute to F_max larger than 4 ε/Å. These are ribbons -
2.10 (41.8 % of the proteins listed in Table 1), β-barrels - 2.40 (8.9 %), β-sandwiches - 2.60 (16.3 %),
β-rolls - 3.10 (5.4 %), 3-layer (aba) sandwiches - 3.40 (5.4 %), and those with no CATH classification
to date (21.8 %). The corresponding distributions of forces are shown in the top six panels of Figure 3
and the topologies involved are listed and named in Table 3.

Examples of architectures that are dominant contributors to a low force behavior are the α orthogonal
bundle (the right bottom panel of Figure 3), the α up-down bundle, and the β-roll (the left bottom
panel of Figure 3).



SCOP-based classes and folds 

The SCOP (Structural Classification of Proteins) data base [27-29] is curated manually and it relies on 
making comparisons to other structures through a visual inspection. This classification scheme is also 
hierarchical and the broadest division is into seven classes and three quasi-classes. The classes are labelled 
a through g and these are as follows: mainly α (a), mainly β (b), α/β which groups proteins in which
helices and β-sheets are interlaced (c), α+β with the helices and β-sheets grouped into clusters
that are separated spatially (d), multidomain proteins (e), membrane and cell-surface proteins (f), and
small proteins that are dominated by disulphide bridges or the heme metal ligands (g). The quasi-classes
are labelled h through j and they comprise coiled-coil proteins (h), structures with low resolution (i),
and peptides and short fragments (j). The classes are then partitioned into folds that share spatial
arrangement of secondary structures and the nature of their topological interlinking. Folds are then
divided into superfamilies (same fold but small sequence identity) and then families (two proteins are
said to belong to the same family if their sequence identity is at least 30%). Families are then divided
into proteins - a category that groups similar structures that are linked to a similar function. Proteins
comprise various protein species.

Each structure assignment comes with an alphanumeric label, as shown in Tables 1, S1, and 4, which
reflects the placement in the hierarchy. At the time of our download, there were 92 972 entries in
the SCOP data base that are assigned to 34 495 PDB structures. These entries are divided into 3464
families, 1777 superfamilies and 1086 unique folds. A given structure may have several entry labels but
the dominant assignment is listed first. We use the primary assignment in our studies. The same rule is 
also applied to the CATH-based codes. 

Figure 4 shows the distributions of forces for the SCOP-based classes of proteins. The results are
consistent with the CATH-based classes since the α-β class of CATH basically encompasses the α/β
and α+β classes of SCOP. However, there are proteins which are classified according to only one of the
two schemes. Thus there are 4431 α-β proteins out of which only 3368 are SCOP-classified
as belonging to the α+β and α/β classes. At the same time, the total number of proteins we have in the
α+β and α/β classes is 4795.

It should be noted that the peak in the distribution for α+β is shifted to higher forces by about
0.7 ε/Å from the peak for α/β. At the same time, the zero-force peak is virtually absent in α+β. The
SCOP-based classification also reveals that its class g contributes across the full range of forces and, in
particular, it may lead to large values of F_max. It should be noted, as also evidenced by Table 1, that
there is a substantial number of strong proteins that have no class assignment.

Figures 5 and 6 refer to the distributions of F_max across specific folds. The first of these presents
results for the folds that give rise to the largest forces. The names of such folds are specified in Figure
5. The percentage-wise assessment of the folds contributing to big forces is presented in Table 4. The
top contributor is found to be the b.47 fold (SMAD/FHA domain). Figure 6 gives examples of folds that
typically yield low forces.

It is interesting to note that distributions corresponding to some folds are distinctly bimodal, as
in the case of the SMAD/FHA fold (b.47). This particular fold is dominated by the SMAD3 MH2 domain
(b.47.1.2; 352 structures) which contributes both to the high and low force peaks in the distribution. The
remaining domains (b.47.1.1, b.47.1.3, and b.47.1.4) contribute only to the low force peak. The dynamical
bimodality of the b.47.1.2 fold can be ascribed to the fact that the strong subset comes with one extra
disulphide bond relative to the weak subset. This extra bond provides substantial additional mechanical
stability when stretching is accomplished by the termini. We illustrate sources of this bimodality in the
SI (Figure S1) for two proteins from this fold: 1bra which is strong and 1elc which is weak. In ref. [18],
we have noted that various sets of proteins with identical CATH codes (e.g., 3.10.10) may give rise to
bimodal distributions without any dynamical involvement of the disulphide bonds. The reason for this
is that even though the contact maps for the two modes are similar, the weaker subset misses certain
longer ranged contacts which pin the structure. Mechanical stability is thus sensitive to structural and
dynamical details that are not provided by standard structural descriptors.



Force clamps 

Shearing motif. The most common type of the force clamp identified in the literature is illustrated in the 
top left panel of Figure 7, corresponding to the 14th-ranked protein 1c4p. In this case, the strong resistance
to pulling is due to a simultaneous shearing of two β-strands which are additionally immobilised by
short β-strands that adhere to the two strands. Similar motifs appear in 1qqr(15), 1j8s(17), 1j8r(19),
1f3y(20), 2pbt(29), 2fzl(15), and 1aoh(19), where the numbers in brackets indicate the ranking as shown in Table
1. It is interesting to note that the β-strands responsible for the mechanical clamp in 1j8s and 1j8r
display an additional twist. Undoing the twist enhances F_max. (There is a similar mechanism that seems
to be operational in the case of a horseshoe conformation found in ankyrin [32,33].) The force clamps are
identified by investigating the effect of removal of various groups of contacts on the value of F_max [12,18].
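The contact-removal test can be sketched as a simple scan over groups of contacts. Everything in the example is hypothetical: f_max_of stands in for a full stretching simulation, and the toy model and contact groups at the end only serve to make the sketch runnable.

```python
def rank_contact_groups(contacts, groups, f_max_of):
    """Rank groups of contacts by how much their removal lowers F_max.

    contacts: set of (i, j) native contacts.
    groups: dict mapping a group name to the set of contacts in that group.
    f_max_of: callable returning F_max for a given contact set
              (placeholder for a stretching simulation).
    Returns a list of (name, drop) sorted from the largest drop to the smallest.
    """
    reference = f_max_of(contacts)
    drops = []
    for name, group in groups.items():
        reduced = contacts - group  # switch off the contacts of this group
        drops.append((name, reference - f_max_of(reduced)))
    return sorted(drops, key=lambda item: item[1], reverse=True)

# Toy stand-in for a simulation: F_max grows with the number of long-range contacts.
def toy_f_max(contact_set):
    return 0.5 * sum(1 for i, j in contact_set if j - i > 10)

contacts = {(1, 20), (2, 19), (3, 18), (5, 8), (6, 9)}
groups = {
    "strands A-B": {(1, 20), (2, 19), (3, 18)},
    "helix turn": {(5, 8), (6, 9)},
}
print(rank_contact_groups(contacts, groups, toy_f_max))
# [('strands A-B', 1.5), ('helix turn', 0.0)]
```

The group whose removal produces the largest drop is the candidate force clamp; in practice each evaluation would be a separate stretching run rather than a formula.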

There are, however, new types of force clamps that we observe in the proteins listed in Tables
1 and S1. They arise from entanglements resulting from the presence of the disulphide bonds which
cannot be ruptured by forces accessible in atomic force microscopy. We note that about 2/3 of the
proteins listed in Tables 1 and S1 contain disulphide bonds. Many of these bonds do not carry much
dynamical relevance when pulling by the termini. However, in certain situations they are the essence of
the force clamp. The disulphide bonds have already been identified as leading to formation of the cysteine
knot (CK) motifs [34,35] (such proteins are found in the toxins of spiders and scorpions) and the cyclic
CK motifs [36,37]. Here, we find still another motif - that of the CSK which is similar to that found in
slipknotted proteins [38-40] which do not contain disulphide bonds. This motif is found in the top 13
proteins. The cysteine loop, knot, and slipknot motifs are shown schematically in the remaining panels of
Figure 7. It is convenient to divide these motifs into two categories: shallow (S) and deep (D) (according
to the classification used for knotted proteins [41,42]), depending on whether the motif spans most
of the sequence or is instead localized in a small fraction of it.

Shearing connected with a cysteine loop. In this case, the mechanical clamp arises from shearing
between a β-strand belonging to a deep cysteine loop and another strand located outside the loop (the left
bottom panel of Figure 7). The existence of the disulphide bond before the shearing motif makes it possible to
decompose the direct tension onto the β-strands, making the protein resist stretching much more effectively
than what would be expected from a simple shearing motif. Additionally, the disulphide bonds prevent an onset
of any rotation in the protein conformation which otherwise might provide an opportunity for unzipping.
This motif appears in 1dzj(40,D), 1vsc(37,D), 1dzk(35,D), 1i04(81,D), 1hqp(83,D), 1oxm(98,D), 2a2g(175,D),
2boc(179,D), and many other proteins. The middle panel of Figure 5 gives an example of the
corresponding force (F) - displacement (d) pattern as obtained for 1dzj.

Shearing and dragging out of a cysteine loop. This motif consists of two parts. The first 
is formed by a rather small and deep cysteine loop which is located very close to one terminus with 
the second terminus located across the cysteine loop. The motif arises when almost all of the protein
backbone is dragged across the cysteine loop on stretching. The protein structure also contains a few
β-strands which get sheared before dragging takes place. This motif is seen in 1kdm(23,D), 1q56(24,D),
1qu0(33,D), and 1f5f(34,D), and we call this geometry of pulling geometry I. It should be pointed out that,
in all such cases, pulling by the N terminus takes place within (or very near) the plane formed by the
cysteine loop. A small change in such a geometry, e.g. the one arising from pulling not by the last
amino acid but by the penultimate bead, may cause the chain to get out of the cysteine loop and result in a very
different unfolding pathway with a distinctly different value of F_max. In this other kind of pulling
set-up, denoted as geometry II, the loop is bypassed and the resistance to pulling is provided only by the
shearing mechanism.

Dragging arises from overcoming steric constraints and generates an additional contribution to the 
strength of the standard shearing mechanical clamp. By using geometry II and also by eliminating the 
native contacts between the sheared β-strands we can estimate the topological contribution of the
dragging effect to the value of F_max. For proteins 1kdm, 1q56, 1qu0, and 1f5f, it comes out to be around
25 %. The F-d patterns corresponding to these two geometries of pulling are shown in the top panel of
Figure 8.

In the survey, there are other proteins which also have disulphide bonds and belong to the 2.60.120.200
category. These proteins have a cysteine loop which is either very shallow or deep, but is located in the middle
of the protein backbone so that there is no possibility to form a long β-strand. In this case, the dragging
effects are much smaller. For instance, for 1pz7(D) and 1cpm(S), F_max is close to 1 ε/Å.

Shearing inside of a cysteine knot. This motif is created by a loosely packed CK (two or more
spliced cysteine loops) with at least two parallel β-strands that are present within the knot. Pulling
the protein by its termini exerts tension on the entire CK and thus produces an indirect shearing force on the
β-strands inside the entangled part of the protein. In this case, elimination of the native contacts
between the β-strands reduces F_max only partially, indicating that the mechanical clamp is created
also by the CK. A simple CK is found in 2bzm(42) and many other proteins, e.g. in 2g7i(77,S),
1hfh(103,S), 2g4x(136,D), 2g4w(169,D). The F-d patterns for 2bzm and 2g4x are shown in the bottom
panel of Figure 8. More complex structures or higher order CKs (with more than two cysteine bonds) can
be identified in 1afk(85), 1afl(117), or 1aqp(135). Inside this group of proteins there are also examples of
proteins - 1qoz(88,S) - in which a cysteine loop is braided to a CK by some native contacts.

Cysteine slipknot force-clamp is observed in the strongest 13 proteins. The top strength protein 
is 1bmp (bone morphogenic protein) with the predicted F_max of 10.2 ε/Å, which should correspond to
about 1100 pN (see Materials and Methods). This strength should be accessible to standard experiments
as atomic force microscopy has already been used to rupture covalent N-C and C-C bonds by forces
of 1500 and 4500 pN respectively [43].

In our discussion, we focus on the 13th-ranked 1vpf (a vascular endothelial growth factor) with the
predicted F_max of 5.3 ε/Å. The CSK motif arises from two loops [40]: the knot-loop and the slip-loop,
where the slip-loop can be threaded across the knot-loop. One needs at least three disulphide bonds for
this motif to arise.

In the case of 1vpf, the knot-loop is created by the disulphide bonds between amino acids 57
and 102, 61 and 104, and the protein backbone between amino acids 57-61 (GLY, GLY, CYS) and
102-104 (GLU). The slip-loop is created by the protein backbone between sites 61-102 and is stabilized by 12
hydrogen bonds between two parallel β-strands. In the CSK motif, the force peak is due to dragging of a
slip-loop through the knot-loop, making the native hydrogen contacts only marginally responsible for the
mechanical resistance. Thus the force peak arises, to a large extent, from overcoming steric constraints,
i.e. it is due to repulsion resulting from the excluded volume. The F-d pattern for this novel type of
a force clamp is shown in the top panel of Figure 5. Another example of such a pattern for a CSK is
shown in the bottom panel of Figure 5 for the 22nd-ranked 2h64 (a human transforming growth factor).
The leading role of the steric constraints is verified by checking the reduction of F_max when all the
slipknot-related contacts (inside the slip-loop and between the slip-loop and the knot-loop) are converted
to be purely repulsive. As a result of this bond removal, the force peak persists, though it gets shifted
and becomes smaller. This is summarized in Table S2 in the SI. It is a new and unexpected result.
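The ring described above can be written down explicitly. The sketch below builds the knot-loop and the slip-loop of 1vpf from the residue numbers quoted in the text; the function names and the representation of loops as lists of residue numbers are illustrative choices, not part of the model.

```python
def knot_loop(bond_a, bond_b):
    """Residues on the closed ring formed by two disulphide bonds.

    bond_a = (i1, j1) and bond_b = (i2, j2), with i1 < i2.
    The ring runs along the backbone from i1 to i2, crosses the bond i2-j2,
    runs along the backbone from j2 to j1, and closes through the bond j1-i1.
    """
    (i1, j1), (i2, j2) = bond_a, bond_b
    step = 1 if j1 >= j2 else -1
    return list(range(i1, i2 + 1)) + list(range(j2, j1 + step, step))

def slip_loop(bond_a, bond_b):
    """Backbone segment between the two backbone pieces of the ring (from i2 to j1)."""
    (_, j1), (i2, _) = bond_a, bond_b
    return list(range(i2, j1 + 1))

ring = knot_loop((57, 102), (61, 104))
print(ring)                                   # [57, 58, 59, 60, 61, 104, 103, 102]
print(len(slip_loop((57, 102), (61, 104))))   # 42
```

The ring thus contains only eight residues, while the slip-loop that has to be dragged through it comprises 42, which makes the steric origin of the force peak plausible.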

Another way to establish the role of the CSK motif is to create disulphide-deficient mutants,
as accomplished experimentally [44] for 1vpf. The two mutants, 1mkk (C61A and C104A) and 1mkg
(C57A and C102A), have structures similar to 1vpf but contain no knot-loops and thus there is no
slipknot. Muller et al. [44] show that the mutants' thermodynamic stability is not reduced but their
folding capacity is. Our work shows that the mutants have a reduced resistance to pulling compared to
1vpf: F_max drops from 5.3 ε/Å to 1.49 and 2.01 ε/Å for 1mkk and 1mkg respectively.
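The size of the effect follows directly from the numbers above. The short calculation below expresses the drop relative to 1vpf; the helper name is illustrative.

```python
def relative_drop(reference, value):
    """Fractional reduction of `value` with respect to `reference`."""
    return (reference - value) / reference

# F_max in units of ε/Å, as quoted in the text.
F_MAX = {"1vpf": 5.3, "1mkk": 1.49, "1mkg": 2.01}

for mutant in ("1mkk", "1mkg"):
    drop = relative_drop(F_MAX["1vpf"], F_MAX[mutant])
    print(f"{mutant}: {100 * drop:.0f}% lower than 1vpf")
# 1mkk: 72% lower than 1vpf
# 1mkg: 62% lower than 1vpf
```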

We note that the CSK topology is a subgroup inside the CK class (represented mostly by 2.10.90.10) 
and the CSK force clamp need not arise for a particular way of pulling. For instance, proteins 1afk(68),
1afl(100), or 1aqp(118) have up to four disulphide bonds and yet the CSK motif does not play any
dynamical role in pulling by the terminal amino acids. In the case of the CSK, we observe a formidable
dispersion in the values of F_max. For example, it ranges between 4.8-5.9, 4.1-4.8, and 4.1-5.2 ε/Å for
various trajectories in 1vpf, 2h64, and 2c7w respectively. We now examine the CSK geometry in more
detail.

The cysteine slipknot motif is distinct from the slipknot motif in several ways. The left-most panel of
Figure 10 shows a slipknot with three intersections at sequential locations $k_1$, $k_2$, and $k_3$. This geometry
is topologically trivial since, when one pulls by the termini, the apparent entanglement may untie and
become a simple line. The entanglement would form the trefoil knot if the $k_3$ intersection were removed by
redirecting the corresponding segment of the chain (thin line) away from the $k_1$-$k_2$ loop. Such slipknot
motifs have been observed in native states of several proteins [38-40]. In contrast, the CSKs are not
present in the native state but arise as a result of mechanical manipulation. The middle panel of Figure
10 shows a schematic representation of a native conformation with three cysteine bonds: between $i_1$ and
$j_1$, between $i_2$ and $j_2$, and between $i_3$ and $j_3$. The $i$-ends of the bonds are counted as being closer to
the N-terminus. The three bonds are in a specific arrangement as shown in the panel. In particular, the
$i_3$-$j_3$ bond must cross the loop $i_1$-$i_2$-$j_2$-$j_1$. This loop consists of two pieces of the backbone ($i_1$-$i_2$
and $j_2$-$j_1$) that are linked to form a closed path by the two remaining cysteine bonds: it is the cysteine
knot-loop. The average radius of this loop is denoted by $R_{ck}$.

The arrangement shown in the middle panel has no entanglements that could be considered as knots in
the topological sense. However, on pulling by the termini, the chain segment adjacent to $i_3$ gets threaded
through the knot-loop since $i_3$ is rigidly attached to $j_3$, as illustrated in the rightmost panel of Figure
10. Pulling by $i_3$-$j_3$ also results in generating another loop, the cysteine slip-loop, since the segment
around $i_3$ gets bent strongly to form a cigar-like shape with the radius of curvature at the $i_3$-tip denoted
by $R_{cs}$. This loop extends between $i_2$ and $j_1$. It should be pointed out that the cysteine knot-loop in the
CSK is stiff whereas in a slipknotted protein (such as the thymidine kinase) its size is variable (as it can
be tightened on the protein backbone [40] in analogy to tightening a knot [45] by pulling).

The dynamics of pulling depends on the relationship between $R_{ck}$ and $R_{cs}$ as the "cigar" may either go
through or get stuck. In the former case a related force peak would arise. If the system were a homogeneous
polymer, dragging would be successful when $R_{ck}$ was bigger than $R_{cs}$. The corresponding force would
be related to the work against the elasticity that is needed to bend the slip-loop to the appropriate
curvature. This work is proportional to the square of the curvature. Thus the total elastic energy involved
in bending the segment $i_2$-$j_1$ is of order $\oint ds\, R^{-2} \sim R^{-1}$ [46], where $s$ is the arc distance. Dividing this
energy by the distance of pulling would yield an estimate of the force measured if thermal fluctuations
were neglected. The geometrical condition for dragging in proteins is more complicated because of the
presence of the side groups and the related non-homogeneities and variability across the hydrophobicity
scale. The diameter of the "rope" that the knot-loop is made of should not exceed the maximum linear
extension, $t_k$, of the amino acids. Thus the effective inner radius of the knot-loop is $R_{ck} - t_k$. Similarly, the
size of the outer circle that is tangential to the tightest slip-loop is $R_{cs} + t_s$, where $t_s$ is the thickness of the
slip-loop. (Both thicknesses can be considered as being site dependent and including possible hydration
layer effects near polar amino acids.) Thus the slip-knot can be driven through the cysteine knot-loop
provided

$$R_{cs} + t_s < R_{ck} - t_k . \tag{1}$$

In our simulations, the successful threading situations correspond to $R_{ck}$ and $R_{cs}$ of around 7 and 3 Å.
The amino acids in the knot-loop are mostly Gly, Ala, or Cys with their side groups pointing outside of
the loop. One may then estimate $t_k$ to be about 1.5 Å. On the other hand, the linear size of the amino
acids in the slip-loop can be determined to be close to 2.5 Å. These estimates indicate that $R_{cs} + t_s$
can be very close to $R_{ck} - t_k$ so the possibility of slipping through the knot-loop is borderline. In fact,
slipping might be forbidden within the framework of the tube-picture of proteins [47, 48] in which the
effective thickness of the tube is considered to be 2.7 Å.
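
The geometrical criterion of Eq. (1) can be checked directly against the estimates quoted above. The following is only an illustrative sketch: the function name is hypothetical, and applying the single tube thickness of 2.7 Å to both loops in the tube picture is an assumption made here for the sake of the example.

```python
def can_thread(r_cs, t_s, r_ck, t_k):
    """Eq. (1): the slip-loop passes if R_cs + t_s < R_ck - t_k (all in angstroms)."""
    return r_cs + t_s < r_ck - t_k


# Estimates quoted in the text (angstroms).
R_CK, R_CS = 7.0, 3.0   # radii of the knot-loop and of the slip-loop tip
T_K, T_S = 1.5, 2.5     # thicknesses of the knot-loop and of the slip-loop

outer = R_CS + T_S      # outer radius of the tightest slip-loop
inner = R_CK - T_K      # effective inner radius of the knot-loop
borderline = can_thread(R_CS, T_S, R_CK, T_K)

# Tube picture: assume the same thickness of 2.7 for both loops (an assumption).
TUBE = 2.7
tube_allows = can_thread(R_CS, TUBE, R_CK, TUBE)
```

With these numbers both sides of Eq. (1) equal 5.5 Å, so the strict inequality is not satisfied and the case is exactly borderline, while with the tube thickness the left side (5.7 Å) clearly exceeds the right side (4.3 Å).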

The CSK motifs give rise to a force peak in 1vpf, 2h64(22,S), 1rv6(25,S), 1waq(26,S), 1reu(27,S),
1tgj(28), 2h62(30,S), 1tgk(31), 2c7w(38,D), 2gyr(39,S), 1lx5(95,D), and many other proteins. In these
cases, the typical value of $R_{ck}$ is about 7 Å. However, specificity may result in somewhat smaller values
of $R_{ck}$ which may cause only smaller segments of the slip-loop to be threaded. If the passage is blocked,
there will be no isolated force peak, as happens in 1tgj and 1vpp.

Types of the force-displacement patterns for proteins with the disulphide bonds. In the
case of proteins with very shallow cysteine knot, loop or slipknot motifs, $F$ increases very rapidly with $d$
and an isolated force peak does not arise ($F_{max} = 0$). Such cases are represented, e.g., by 1bmp, 1rnr, Hd5,
and 1wzn where the slipknots are either very tight or the cysteine loop is very shallow. In the case of a
shallow motif, however, a force peak can sometimes be isolated as in the case of the 13th-ranked protein
1vpf (Figure 8) and in several other proteins, like 1xzg and 1dzk. In this case, the value of $F_{max}$ takes
into account tension on the cysteine bonds and it is not obvious whether such a strong elastic background
should be subtracted from the value of $F$ when determining $F_{max}$ or not. In this survey, we do not
subtract the backgrounds. It should be noted that in our previous surveys we missed the CSK-related
force peaks because we attributed the rapid force rises at the end of pulling just to stretching of the
backbone without realizing that some such rises contain structure.

For a deep motif, the $F$-$d$ pattern may have several small force peaks before the final rise of the
force, as observed for 2g4s and 1bj7. When the CSK motif is very deep, it usually does not have any
influence on the shape of the $F$-$d$ pattern apart from a much steeper final rising force. Such a situation
is seen in the case of, e.g., 1j8r and 1j8s.



Concluding remarks 

This survey identifies a host of proteins that are likely to be sturdy mechanically. Many of them involve
disulphide bridges which bring about entanglements that are complicated topologically, such as CSKs and
CKs. The distinction between the two is that the former can depart from its native conformation and
the latter cannot.

Our survey made use of a coarse grained model so it would be interesting to reinvestigate some of 
the proteins identified here by all-atom simulations, especially in situations when the CSK is involved. 
The CSK motifs may reveal different mechanical properties when studied in a more realistic model. Of 
course, a decisive judgment should be provided by experiment. 

The very high mechanical resistance of the CSK proteins should help one to understand their biological
function. The superfamily of cysteine-knot cytokines (in the class of small proteins and the fold of cysteine-knot
cytokines) includes families of the transforming growth factor (TGF)-β and the polypeptide vascular
endothelial growth factors (VEGFs) [49,50]. The various members of this superfamily, listed in Table
5, have distinct biological functions. For instance, VEGF-B proteins, which regulate blood vessel
and lymphatic angiogenesis, bind only to one tyrosine kinase receptor, VEGFR-1. On the other hand,
VEGF-A proteins bind to two receptors, VEGFR-1 and VEGFR-2. All of these proteins form a dimer
structure. The members of this family are endowed with remarkably similar monomer structures but
differ in their mode of dimerisation and thus in their propensity to bind ligands. Additionally, all dimers
possess almost the same cyclic arrangement of cysteine residues which are involved in both intra- and
inter-chain disulphide bonds. These inter-chain disulphide bonds create the knot- and slip-loops, whereas
the intra-chain disulphide bonds give rise to a CSK motif when the slip-loop gets dragged across the
knot-loop upon pulling.

It has been shown experimentally [51] that such cysteine-related connectivities bring the key residues
involved in receptor recognition into close proximity of each other. They also provide a primary source
of stability of the monomers due to the lack of other hydrogen bonds between two beta strands at the
dimer interface.

The non-trivial topological connection between the monomers allows for mechanical separation of the two
monomers by a distance of about half of the size of the slip-loop. Our results suggest, however, that the
force needed for the separation may be too high to arise in the cell.

Materials and Methods 

The input to the dynamical modeling is provided by PDB-based structures. The structure files may
often contain several chains. In this case, we consider only the first chain that is present in the PDB file.
Likewise, only the first NMR-determined structure is considered. If a protein consists of several domains, we
consider only the first of them.

The modeling cannot be accomplished if a structure has regions or strings of residues which are not
sufficiently resolved experimentally. Essentially all structure-disjoint proteins have been excluded from our
studies. Exceptions were made for the experimentally studied scaffoldin 1aoh and for proteins in which
small defects in the established structure (such as missing side groups) were confined within cysteine loops
and were thus irrelevant dynamically. In these situations, the missing contacts have been added by a
distance-based criterion [23] in which the threshold was set at 7.5 Å. One of the tests used to weed out
inadequate structures involved determining distances between the consecutive Cα atoms. A structure
was rejected if these distances were found to be outside of the range of 3.6-3.95 Å. An exception was
made for proline, which in its native state can accommodate the cis conformation. In that case, the
distance between a proline Cα and that of its subsequent amino acid usually falls in the range between 2.8 and
3.85 Å. For a small group of proteins which slipped through our structure quality checking procedure,
but were found to be easily fixed (e.g. 1f5f, 1fy8, and 2f3c), we used the publicly available software BBQ [52]
to rebuild locations of the missing residues. The limited accuracy of this prediction procedure seems to be
adequate for our model due to its coarse-grained nature.
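
The distance test described above can be sketched in a few lines. This is a minimal illustration and not the procedure actually used in the survey: the function names, the input layout (a list of Cα coordinates plus three-letter residue names), and the convention that the relaxed window applies to the pair starting at a proline are assumptions that follow the wording of the text.

```python
import math


def ca_distances(coords):
    """Distances between consecutive C-alpha atoms; coords is a list of (x, y, z)."""
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


def passes_ca_check(coords, sequence):
    """Return True if every consecutive C-alpha distance lies in the accepted window.

    Standard window: 3.6-3.95 A. If residue i is a proline, the pair (i, i+1)
    may instead fall in 2.8-3.85 A (cis conformation), following the text.
    """
    for i, d in enumerate(ca_distances(coords)):
        ok = 3.6 <= d <= 3.95
        if sequence[i] == "PRO":
            ok = ok or 2.8 <= d <= 3.85
        if not ok:
            return False
    return True


# Hypothetical three-residue fragment with one short (3.0 A) link.
fragment = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.8, 0.0, 0.0)]
accepted_with_proline = passes_ca_check(fragment, ["PRO", "ALA", "GLY"])
accepted_without_proline = passes_ca_check(fragment, ["ALA", "ALA", "GLY"])
```

The short link is tolerated only when it starts at a proline, which mirrors the exception stated in the text.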

The modeling of dynamics follows our previous implementations [11, 12, 18] within model LJ2 except
that the contact map is as in ref. [19], i.e. with the $i, i+2$ contacts excluded. There is also a difference
in the description of the disulphide bonds. In refs. [14, 19] they were treated as an order-of-magnitude
enhancement of the Lennard-Jones contacts in all proteins. In ref. [18] the different treatment of the
disulphide bonds was applied to the proteins that were found to be strong mechanically without any
enhancements. Here, on the other hand, we consider such bonds as harmonic in all proteins, in analogy
to the backbone links between the consecutive Cα atoms. The native contacts are described by the Lennard-Jones
potential

$$V^{6-12} = 4\epsilon\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right],$$

where $r_{ij}$ is the distance between the Cα atoms in amino acids $i$ and $j$, whereas $\sigma_{ij}$ is determined
pair-by-pair so that the minimum in the potential is located at the
experimentally established native distance. The non-native contacts are repulsive below 4 Å.
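
The pair-by-pair choice of the length parameter can be made concrete: for the 6-12 form, the minimum sits at $2^{1/6}\sigma_{ij}$, so placing it at the native distance fixes $\sigma_{ij}$. The sketch below is illustrative only; the function names and the 6 Å example distance are hypothetical, and energies are expressed in units of ε.

```python
def sigma_from_native(r_native):
    """sigma_ij chosen so that the 6-12 potential has its minimum at r_native."""
    return r_native / 2.0 ** (1.0 / 6.0)


def lj_contact(r, r_native, eps=1.0):
    """V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6] for one native contact."""
    s6 = (sigma_from_native(r_native) / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)


# Hypothetical native contact at 6 A: the well depth should be -eps at r = 6 A.
well_depth = lj_contact(6.0, 6.0)
```

At the native distance the potential equals −ε, and it rises on either side, which is the property the text relies on.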

The implicit solvent is described by the Langevin noise and damping terms. The amplitude of the
noise is controlled by the temperature, $T$. All simulations were done at $k_B T = 0.3$ ε, where $k_B$ is the
Boltzmann constant. Newton's equations of motion are solved by the fifth-order predictor-corrector
algorithm. The model is considered in the overdamped limit so that the characteristic time scale, τ, is of
order 1 ns as argued in refs. [6,53]. Stretching is implemented by attaching elastic springs to two amino
acids. The spring constant used has a value of 0.12 ε/Å², which is close to the elasticity of experimental
cantilevers. One of the springs is anchored and the other spring is moving with a constant speed, $v_p$.
Choices in the value of the spring constant have been found to affect the look of the force-displacement
patterns and thus the location of the transition state [54, 55], but not the values of $F_{max}$ [10, 12, 18].

The dependence of $F_{max}$ on $v_p$ varies from protein to protein and is approximately logarithmic in $v_p$, as evidenced by
Figure 11 for several strong proteins. The logarithmic dependence,

$$F_{max} = p \ln(v/v_0) + q,$$

has been demonstrated experimentally, for instance, for polyubiquitin [56,57]. The approximate validity of this
relationship is demonstrated in Figure 11 for three proteins with big values of $F_{max}$. We observe that the
larger the value of $F_{max}$, the more likely it is that the dependence on $v_p$ is strong. When we make a fit
to $F_{max} = p \ln(v/v_0) + q$ for 1vpf, 1c4p, and 1j8s, we get the parameter $p$ to be equal to 0.39 ± 0.11,
0.17 ± 0.03, and 0.04 ± 0.02 ε/Å respectively (the values of $q$ are 7.42 ± 0.63, 5.85 ± 0.16, and 4.96 ± 0.08 ε/Å
correspondingly). However, some strong proteins may have $p$ as low as 0.04.
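
A fit of this kind is an ordinary least-squares line in the variable $\ln(v/v_0)$. The sketch below shows one way to do it in plain Python; the function name and the list of pulling speeds are hypothetical, and the data are synthetic, noise-free values generated from the central fit parameters quoted above for 1vpf, not simulation output.

```python
import math


def fit_log_speed(speeds, forces, v0=1.0):
    """Least-squares fit of F_max = p * ln(v / v0) + q; returns (p, q)."""
    xs = [math.log(v / v0) for v in speeds]
    n = len(xs)
    x_mean = sum(xs) / n
    f_mean = sum(forces) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxf = sum((x - x_mean) * (f - f_mean) for x, f in zip(xs, forces))
    p = sxf / sxx
    q = f_mean - p * x_mean
    return p, q


# Synthetic data from p = 0.39, q = 7.42 (values quoted for 1vpf); speeds in A/tau.
speeds = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05]
forces = [0.39 * math.log(v) + 7.42 for v in speeds]
p, q = fit_log_speed(speeds, forces)

# Force predicted by the fitted line at the survey speed of 0.005 A/tau.
f_survey = p * math.log(0.005) + q
```

With these parameters the line gives about 5.35 ε/Å at the survey speed, in line with the value of 5.3 ε/Å listed for 1vpf.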

When making the survey, we have used $v_p$ of 0.005 Å/τ and stretching was accomplished by attaching
the springs to the terminal amino acids (there is an astronomical number of other choices of the attachment
points).

In order to estimate an effective experimental value of the energy parameter ε, we have correlated
the theoretical values of $F_{max}$ with those obtained experimentally. The experimental data points used in
ref. [14] have been augmented by entries pertaining to 1emb (117-182), 1emb (182-212) [58] (where the
numbers in brackets indicate the amino acids that are pulled) and 1aoh, 1g1k, and 1anu [23]. The full
list of the experimental entries is provided by Table 6. Unlike the previous plots [14] that cross-correlate
the experimental and theoretical values of $F_{max}$, we now extrapolate the theoretical forces to the values
that should be measured at the pulling speeds that are used experimentally. We assume that the unit of
speed, $v_0 = 1$ Å/τ, is of order 1 Å/ns and consider 10 speeds to make a fit to the logarithmic relationship.
The values of parameters $p$ and $q$ for the proteins studied experimentally are listed in Table 6.
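
The extrapolation itself is a one-line evaluation of the logarithmic relationship once the experimental speed is expressed in units of $v_0$. The sketch below assumes, as stated above, that $v_0$ is 1 Å/ns, i.e. $10^8$ nm/s; the function and constant names are hypothetical. The $p$, $q$, and speed values are those listed in Table 6 for 1tit and 1nct.

```python
import math

V0_NM_PER_S = 1.0e8  # 1 A/ns = 0.1 nm per 1e-9 s = 1e8 nm/s (v0 taken as 1 A/ns)


def extrapolate_force(p, q, v_exp_nm_per_s):
    """F_max = p ln(v / v0) + q at an experimental speed given in nm/s."""
    return p * math.log(v_exp_nm_per_s / V0_NM_PER_S) + q


f_1tit = extrapolate_force(0.040, 2.335, 600.0)   # 1tit row of Table 6
f_1nct = extrapolate_force(0.100, 2.703, 500.0)   # 1nct row of Table 6
```

Both results round to the extrapolated values given in Table 6 (1.85 and 1.48 ε/Å), which suggests that this reading of the unit conversion is the one used there.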

The main panel of Figure 12 demonstrates the relationship between the extrapolated theoretical and
experimental values of $F_{max}$. The best fit, indicated by the solid line, corresponds to the slope of
0.0091. The inverse of this slope yields 110 pN as an effective equivalent of the theoretical force unit of
ε/Å. The Pearson correlation coefficient, $R^2$, is 0.832, the rms percent error, $r_e$, is 1.02, and the Theil $U$
coefficient (discussed in ref. [14]) is 0.281. The inset shows a similar plot obtained when the extrapolation
to the experimental speeds is not done. The resulting unit of the force would be equivalent to 110 pN
which differs from the previous estimate of 71 pN (shown by the dotted line in the main panel) because
of the inclusion of the newly measured proteins and implementation of the extrapolation procedure.
The statistical measures of error here are $R^2 = 0.851$, $r_e = 0.37$, and $U = 0.251$. These measures are
better compared to the case with the extrapolation because the extrapolation procedure itself brings in
additional uncertainties. Nevertheless, implementing the procedure seems sounder physically. The spread
between these various effective units of the force suggests an error bar of order 30 pN on the currently
best value of 110 pN.
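
The conversion between the theoretical force unit and piconewtons follows directly from the fitted slope. The sketch below is a minimal illustration with hypothetical function names; it uses the slope of 0.0091 quoted above and, as an example, the extrapolated titin force of 1.85 ε/Å from Table 6.

```python
SLOPE = 0.0091  # fitted slope quoted in the text, in (eps/A) per pN


def force_unit_pn(slope=SLOPE):
    """Effective size of the theoretical force unit eps/A in pN."""
    return 1.0 / slope


def to_piconewtons(f_theory, slope=SLOPE):
    """Convert a force expressed in eps/A to pN using the effective unit."""
    return f_theory * force_unit_pn(slope)


unit = force_unit_pn()          # close to 110 pN
titin_pn = to_piconewtons(1.85) # extrapolated 1tit force from Table 6
```

The titin example comes out at roughly 203 pN, which can be compared with the experimental entry of 204 ± 30 pN in Table 6; given the error bar of order 30 pN on the unit itself, such agreement should not be over-interpreted.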
Tables



TABLE 1: The predicted list of the strongest proteins.

n     PDBid  N    Fmax [ε/Å]  Lmax    λ     CATH          SCOP
1     1bmp   104  10.2        23.2    0.01  2.10.90.10    g.17.1.2
2     1qty   95   8.9         72.1    0.11  2.10.90.10    b.1.1.4
3     2bhk   119  7.3         26.5    0.67  -             -
4     1lxi   104  7.3         22.5    0.01  -             g.17.1.2
5     1cz8   107  6.4         76.5    0.13  2.10.90.10    b.1.1.1
6     2gh0   219  5.8         25.9    0.06  -             -
7     1wq9   100  5.5         72.0    0.10  2.10.90.10    g.17.1.1
8     1flt   107  5.5         75.6    0.12  2.10.90.10    b.1.1.4
9     1fzv   117  5.4         90.4    0.12  2.10.90.10    g.17.1.1
10    2gyz   100  5.4         14.4    0.01  -             -
11    1rew   103  5.3         21.7    0.01  2.10.90.10    g.7.1.3
12    1m4u   139  5.3         52.1    0.07  2.10.90.10    g.17.1.2
13    1vpf   94   5.3         68.1    0.11  2.10.90.10    g.17.1.1
14    1c4p   137  5.1         106.0   0.12  3.10.20.180   d.15.5.1
15    1qqr   138  5.0         110.3   0.12  3.10.20.180   d.15.5.1
16    3bmp   114  5.0         33.0    0.03  2.10.90.10    g.17.1.2
17    1j8s   193  4.9         77.9    0.03  2.60.40.1370  b.2.3.3
18    1wq8   96   4.9         82.6    0.11  2.10.90.10    g.17.1.1
19    1j8r   193  4.8         77.7    0.03  2.60.40.1370  b.2.3.3
20    1f3y   165  4.8         284.7   0.43  3.90.79.10    d.113.1.1
21    2vpf   109  4.7         79.3    0.11  2.10.90.10    g.17.1.1
22    2h64   105  4.6         29.4    0.03  -             g.7.1.3
23    1kdm   177  4.6         309.4   0.45  2.60.120.200  b.29.1.4
24    1q56   195  4.5         473.2   0.62  2.60.120.200  b.29.1.4
25    1rv6   94   4.5         67.7    0.11  2.10.90.10    b.1.1.4
26    1waq   104  4.5         20.1    0.01  -             -
27    1reu   103  4.5         20.4    0.01  2.10.90.10    g.17.1.2
28    1tgj   112  4.4         45.9    0.07  2.10.90.10    g.17.1.2
29    2pbt   133  4.4         219.9   0.39  -             -
30    2h62   104  4.4         24.3    0.02  -             g.7.1.3
31    1tgk   112  4.4         44.6    0.07  2.10.90.10    g.17.1.2
32    2fzl   197  4.4         49.7    0.02  -             c.37.1.19
33    1qu0   181  4.3         156.9   0.22  2.60.120.200  b.29.1.4
34    1f5f   172  4.3         186.2   0.28  2.60.120.200  b.29.1.4
35    1dzk   148  4.3         110.3   0.16  2.40.128.20   b.60.1.1
36    1aoh   147  4.3         77.1    0.01  2.60.40.680   b.2.2.2
37    1vsc   196  4.3         238.3   0.24  2.60.40.10    b.1.1.3
38    2c7w   96   4.2         184.2   0.45  2.10.90.10    -
39    2gyr   97   4.2         27.1    0.05  2.10.90.10    -
40    1dzj   148  4.2         111.0   0.16  2.40.128.20   b.60.1.1
41    2sak   121  4.2         76.0    0.10  3.10.20.130   d.15.5.1
42    2bzm   129  4.2         124.3   0.24  -             -
43    2pql   134  4.1         222.6   0.39  -             -
44    1nwv   129  4.1         129.8   0.13  2.10.70.10    g.18.1.1
45    1e5g   120  4.1         133.1   0.17  2.10.70.10    g.18.1.1
46    2ick   220  4.1         462.5   0.54  -             -
47    1gvl   223  4.1         114.9   0.09  2.40.10.10    b.47.1.2
48    1tgs   225  4.1         122.3   0.10  2.40.10.10    b.47.1.2
49    1u20   196  4.0         408.5   0.53  -             d.113.1.1
50    1cui   197  4.0         422.8   0.55  3.40.50.1820  c.69.1.30
51    1ffd   197  4.0         423.0   0.55  3.40.50.1820  c.69.1.30
52    1kdk   177  4.0         357.2   0.53  2.60.120.200  b.29.1.4
53    2icj   219  4.0         455.9   0.53  -             -
54    3dd5   194  4.0         403.3   0.53  -             -
55    1cug   197  4.0         422.6   0.55  3.40.50.1820  c.69.1.30
56    1b0o   161  4.0         237.3   0.36  2.40.128.20   b.60.1.1
57    1xza   197  4.0         422.9   0.55  3.40.50.1820  c.69.1.30
58    1vcd   126  4.0         199.7   0.37  -             d.113.1.1
59    1cuw   197  4.0         422.9   0.55  3.40.50.1820  c.69.1.30
60    1xzi   197  4.0         422.9   0.55  3.40.50.1820  c.69.1.30
61    1cus   197  4.0         423.3   0.55  3.40.50.1820  c.69.1.30
62    1cuf   197  4.0         423.1   0.55  3.40.50.1820  c.69.1.30
63    2a7h   223  4.0         114.7   0.10  2.40.10.10    b.47.1.2
64    1cq3   224  4.0         128.0   0.12  2.60.240.10   b.27.1.1
65    1ffc   197  3.9         421.6   0.55  3.40.50.1820  c.69.1.30
66    1vc9   126  3.9         199.1   0.37  -             d.113.1.1
67    1cua   197  3.9         423.0   0.55  3.40.50.1820  c.69.1.30
68    1xzl   197  3.9         423.1   0.55  3.40.50.1820  c.69.1.30
69    2faw   250  3.9         250.8   0.25  -             -
70    2vn5   142  3.9         49.2    0.02  -             -
71    1cux   197  3.9         421.5   0.55  3.40.50.1820  c.69.1.30
72    1cuh   197  3.9         421.6   0.55  3.40.50.1820  c.69.1.30
73    2dsd   195  3.9         429.7   0.56  -             -
74    2f3c   221  3.9         113.0   0.10  2.40.10.10    b.47.1.2
75    1xzj   197  3.9         421.8   0.55  3.40.50.1820  c.69.1.30
76    1xzf   197  3.9         421.0   0.55  3.40.50.1820  c.69.1.30
77    2g7i   124  3.9         106.6   0.10  -             -
78    1g1k   143  3.9         52.0    0.02  2.60.40.680   b.2.2.2
79    1cuc   197  3.9         421.3   0.55  3.40.50.1820  c.69.1.30
80    1xzk   197  3.9         422.5   0.55  3.40.50.1820  c.69.1.30
81    1i04   159  3.9         231.7   0.34  2.40.128.20   b.60.1.1
3144  1ubq   76   2.2         47.9    0.04  3.10.20.90    d.15.1.1
3580  1tit   89   2.1         55.3    0.04  2.60.40.10    b.1.1.4

Table 1. $F_{max}$ is obtained within the LJ3 model at the pulling velocity of 0.005 Å/τ. The first
column indicates the ranking of a model protein, the second the PDB code, and the third the number
of the amino acids that are present in the structure used. $L_{max}$ denotes the end-to-end distance at
which the maximum force arises. λ is the corresponding dimensionless location defined as
λ = ($L_{max} - L_n$)/($L_f - L_n$), where $L_n$ is the native end-to-end distance and $L_f$ corresponds to full extension. The
last two columns give the leading CATH and SCOP codes; a dash marks a missing entry. The survey is performed based strictly on
the PDB-assigned structure codes. It may happen that the structure of a protein has been determined
several times and then each of these determinations leads to its own value of $F_{max}$. In this case, one may
derive the best estimate either by picking the best resolved structure or by making (weighted) averages
over all related structures.
TABLE 2. Gene Ontology terms for the top 190 proteins.

Domain              GO identifier  Term name                           No. of structures  Example
Molecular function  GO:0016787     hydrolase activity                  90                 1f3y
                    GO:0003824     catalytic activity                  70                 1gvl
                    GO:0004252     serine-type endopeptidase activity  39                 1c4p
                    GO:0008083     growth factor activity              25                 1bmp
Biological process  GO:0006508     proteolytic activity                34                 2a7h
                    GO:0007586     digestion                           32                 1bra
Cellular component  GO:0005576     extracellular region                122                1vpf, 1aoh
                    GO:0005515     protein binding                     70                 1bmp

TABLE 3: CATH classes (C), architectures (A), and topologies (T) contributing to the top strength
proteins. The percentages indicated in the column denoted by "Strong" are relative to the top 190 proteins
listed in Table 1. X corresponds to proteins not listed in CATH.

C A T     Strong  All    Root name
2.        57.3%   26.4%  Mainly β
2.10      17.3%   2.0%   Ribbon
2.10.90   12.1%   0.3%   Cystine Knot Cytokines, subunit B
2.10.70   5.2%    0.1%   Complement Module, domain 1
2.40      25.7%   8.9%   β Barrel
2.40.10   21.5%   2.9%   Thrombin, subunit H
2.60      14.2%   10.6%  Sandwich
2.60.40   3%      7%     Immunoglobulin-like
3.        26.8%   25.8%  α-β
3.10      8.4%    5.2%   Roll
3.10.20   2.6%    1.3%   Ubiquitin-like (UB roll)
3.10.130  5.7%    1.0%   P-30 Protein
3.40      17.9%   9.4%   3-Layer (αβα) Sandwich
3.40.50   17.9%   5.6%   Rossmann fold
X         15.7%   26.6%
TABLE 4: SCOP classes (C) and folds (F) contributing to the top strength proteins. X corresponds to
proteins not listed in SCOP.

C F    Strong  All    Root name                                               Description
b.     40.5%   22.7%  β
b.47   21.5%   2.7%   SMAD/FHA domain                                         sandwich; 11 strands in 2 sheets; greek-key
c.     17.9%   9%     α/β                                                     Mainly parallel β-sheets (β-α-β units)
c.69   15.7%   0.3%   Pyruvate kinase C-terminal domain-like                  3 layers: a/b/a; mixed β-sheet of 5 strands, order 32145, strand 5 is antiparallel to the rest
d.     11.05%  18.9%  α + β                                                   Mainly antiparallel β-sheets (segregated α and β regions)
d.5    5.8%    0.9%   RNase A-like                                            contains long curved β-sheet and 3 helices
d.113  2.6%    0.2%   DsrC, the γ subunit of dissimilatory sulfite reductase  β(3)-α(5); meander β-sheet packed against array of helices
g.     13.7%   4.9%   Small proteins                                          Usually dominated by metal ligand, heme, and/or disulfide bridges
g.17   5.2%    0.1%   Necrosis inducing protein 1, NIP1                       disulfide-rich fold; all-β; duplication: contains two structural repeats
g.18   6.3%    0.2%   Trefoil/Plexin domain-like                              disulfide-rich fold; common core is α+β with two conserved disulfides
X      16.3%   27.4%







TABLE 5: Members of the cysteine-knot cytokines superfamily. VEGF stands for vascular endothelial
growth factor, BMP for bone morphogenetic protein, and TGF for transforming growth factor. The star
* indicates uncomplexed proteins.

family  domain/complex                             PDB
VEGF    VEGF-A                                     1vpf*, 2vpf*, 1cz8, 1bj1, 1flt, 1qty, 1fpt, 1mjv, 1mkg, 1mkk
        VEGF-B                                     2c7w
        VEGF-F                                     1wq9, 1wq8, 1rv6, 1fzv
TGF     BMP7/ActRII                                1lx5, 1lxi, 1m4u, 1bmp
        BMP2/IA                                    1reu, 1rew, 2es7, 3bmp*
        BMP2 ternary ligand-receptor complex       2h62, 2h64
        human arthemine/GFRbeta3                   1tgj, 1tgk
        human arthemine/GFRalpha3                  2gh0, 2gyz
BMP     human growth and differentiation factor 5  1waq, 2bhk
TABLE 6: The experimental and theoretical data on stretching of proteins.

n   PDB   Fmax^e [pN]  v_p [nm/s]  Fmax^t [ε/Å]  Fmax^te [ε/Å]  p [ε/Å]  q [ε/Å]  Note                    Ref.
1   1tit  204 ± 30     600         2.15          1.85           0.040    2.335    I27*8                   [21,22]
2   1nct  210 ± 10     500         2.4 ± 0.2     1.48           0.100    2.703    I54-I59                 [59, 60]
3   1g1c  127 ± 10     600         2.3 ± 0.2     2.23           0.038    2.680    I5 titin                [61]
4   1b6i  64 ± 30      1000        1.2           0.74           0.084    1.710    T4 lysozyme(21-141)     [62]
5   1aj3  68 ± 20      3000        1.23          0.71           0.107    1.830    spectrin R16            [63]
6   1dqv  60 ± 15      600         1.5           0.58           0.147    2.349    calcium binding C2A     [64]
7   1rsy  60 ± 15      600         1.7 ± 0.2     1.48           0.040    1.962    calcium binding C2A     [64]
8   1byn  60 ± 15      600         1.4           1.18           0.066    1.981    calcium binding C2A     [64]
9   1efc  < 20         600         0.55          0.37           0.052    0.997    calmodulin              [64]
10  1bni  70 ± 15      300         1.4, 1.7      1.06           0.044    1.606    barnase/I27             [65]
11  1bnr  70 ± 15      300         1.05          0.71           0.053    0.053    barnase/I27             [65]
12  1bny  70 ± 15      300         1.1, 1.3      0.65           0.046    0.046    barnase/I27             [65]
13  1hz6  152 ± 10     700         3.5           2.79           0.064    3.542    protein L               [66]
14  1hz5  152 ± 10     700         2.8           2.22           0.104    0.104    protein L               [66]
15  2ptl  152 ± 10     700         2.2 ± 0.2     1.88           0.045    0.045    protein L               [66]
16  1ubq  230 ± 34     1000        2.32          1.47           0.134    3.019    ubiquitin               [57]
17  1ubq  85 ± 20      300         0.9           0.72           0.083    1.779    ubiquitin(K48-C)*(2-7)  [56, 57]
18  1emb  350 ± 30     3600        5.15 ± 0.4    4.16           0.121    5.403    GFP(3-132)              [67]
19  1emb  407 ± 45     12000       5.15 ± 0.4    4.30           0.121    5.403    GFP(3-132)              [68]
20  1emb  346 ± 46     2000        5.15 ± 0.4    4.09           0.121    5.403    GFP(3-132)              [68]
21  1emb  117 ± 19     3600        2.3, 4.3      1.91           0.050    2.427    GFP(3-212)              [68]
22  1emb  127 ± 23     3600        2.2 ± 0.2     1.51           0.164    3.197    GFP(132-212)            [68]
23  1emb  548 ± 57     3600        3.5 ± 0.1     2.89           0.142    4.347    GFP(117-182)            [58]
24  1emb  356 ± 61     3600        3.2 ± 0.2     2.94           0.075    3.709    GFP(182-212)            [58]
25  1emb  104 ± 40     3600        2.3 ± 0.2     1.26           0.236    3.683    GFP(N-C)                [67]
26  1fnf  75 ± 20      3000        1.6, 1.8      1.70           0.130    3.069    FnIII-10                [69, 70]
27  1ttf  75 ± 20      600         0.7, 1.2      0.99           0.006    1.071    FnIII-10                [71]
28  1ttg  75 ± 20      600         0.7, 1.0      0.17           0.099    1.365    FnIII-10                [71]
29  1fnh  124 ± 18     600         1.8           1.10           0.127    2.635    FnIII-12                [70]
30  1fnh  89 ± 18      600         1.4, 1.7      1.10           0.127    2.635    FnIII-13                [70]
31  1oww  220 ± 31     600         2.1 ± 0.2     2.01           0.024    2.300    FnIII-1                 [70]
32  1ten  135 ± 40     500         1.7           1.53           0.026    1.857    TNFnIII-3               [70, 72]
33  1pga  190 ± 20     400         2.4 ± 0.2     2.50           0.001    2.761    protein G               [73]
34  1gb1  190 ± 20     400         1.65 ± 0.2    1.69           0.045    2.237    protein G               [73]
35  1aoh  480 ± 14     400         4.3 ± 0.2     3.69           0.119    0.119    scaffoldin c7A          [23]
36  1g1k  425 ± 9      400         3.9 ± 0.01    3.22           0.028    4.106    scaffoldin c1C          [23]
37  1anu  214 ± 8      400         3.3 ± 0.03    2.55           0.060    3.224    scaffoldin c2A          [23]
38  1qjo  15 ± 10      600         1.2           1.25           0.029    1.601    eE2lip3(N-C)            [26]

Table 6. $F_{max}^{e}$ denotes the experimentally measured value of $F_{max}$ as reported in the reference stated
in the last column. $v_p$ denotes the experimental pulling speed used. $F_{max}^{t}$ is the value of the maximal
force obtained in our simulations within the LJ3 model. They were performed at $v_p = 0.005$ Å/τ. $F_{max}^{te}$
corresponds to the theoretical estimate of $F_{max}$ when extrapolated to the experimental speeds. The
extrapolation assumes the approximate logarithmic dependence $F_{max} = p \ln(v/v_0) + q$, where $v_0$ is
1 Å/τ. 10 speeds were used to determine the values of $p$ and $q$ in analogy to the procedure illustrated
in Figure 11. The values of $p$ and $q$ are provided in columns 7 and 8 of the Table respectively. The first
column indicates the corresponding symbol that is used in Figure 12.
Figure Legends




Figure 1. Probability distribution of the maximal forces obtained in the set of 17 134 
model proteins (solid line). The shaded histogram corresponds to the 7510 proteins studied in 
ref. [19]. The insets show similar distributions for the CATH-based classes indicated. The numbers 
underneath the class symbols give the size of the set of the proteins considered. 




Figure 2. Similar to Figure 1 but for proteins belonging to specific ranges of the
sequential sizes, as indicated by the symbols a, b, and c.



[Figure 3 graphic. Panels (CATH architectures, with the number of structures): ribbon 2.10 (343);
β-barrel 2.40 (1540); 3-layer (αβα) sandwich 3.40 (1625); β-sandwich 2.60 (1832); α/β-roll 3.10 (901);
α-β complex 3.90 (421); β-roll 2.30 (428); orthogonal bundle 1.10 (2442). Horizontal axis: $F_{max}$ [ε/Å].]

Figure 3. The top six panels show probability distributions of $F_{\max}$ for the architectures
that contribute to the pool of proteins with large forces. The architectures are indicated by
their names and the accompanying CATH numerical symbol. The numbers underneath the symbols of
the architecture give the number of cases contributing to the distribution. The bottom two
panels show examples of architectures that are predicted to yield only small values of $F_{\max}$.

[Figure 4 plot. SCOP classes shown in the panels (number of structures): α+β (3239); β (3898);
α/β (1556); α (2538); coiled-coil (102); small proteins (848); membrane & surface (127);
not in SCOP (4707). Horizontal axis: $F_{\max}$.]



Figure 4. Distributions of $F_{\max}$ for the SCOP-based classes for which there are more than
60 structures that could be used in molecular dynamics studies. The cases that are not shown
are: class e (27 structures), quasi-class i (5 structures), and quasi-class j (52 structures). The bottom
right panel corresponds to structures which have no assigned SCOP-based structure label. The numbers
indicate the corresponding numbers of structures studied.

[Figure 5 plot. SCOP folds as labelled in the panels (number of structures): immunoglobulin-like
β-sandwich (1024); b.3, prealbumin-like (108); b.2, common diphtheria toxin/transcription (41);
b.60, GroES-like (183); b.47, SMAD/FHA domain (460); d.15, penicillin-binding protein (227);
c.69, pyruvate kinase C-terminal domain-like (51); g.18, necrosis inducing protein 1, NIP (25).
Horizontal axis: $F_{\max}$ [ε/Å].]



Figure 5. Distributions of $F_{\max}$ for eight folds that may give rise to a large resistance to
pulling.

Figure 6. Distribution of $F_{\max}$ for eight folds that are likely to yield a small resistance to
pulling.

[Figure 7 panel labels: β-strands (1c4p); cys loop (1cuz).]




Figure 7. Examples of force clamps found in the top strength proteins. The relevant
disulphide bonds are shown in gray. The PDB codes of example proteins that show
the particular type of clamp are indicated. In the case of the CSK, the numbers indicate the sequential
locations of the amino acids participating in a disulphide bridge in the first-ranked 1vpf.




Figure 8. Examples of the force patterns corresponding to proteins with the disulphide 
bonds. 

Figure 9. Top: Two trajectories arising in protein 1qu0. Dragging occurs when the backbone is
pulled across the cysteine loop. Shearing occurs when the pull across the cysteine loop does not take
place. Bottom: The force-displacement pattern corresponding to the CSK force clamp in 2h64 (thick
line). The thin line shows the corresponding pattern when one removes the attractive contacts that are
slipknot related.

Figure 10. Geometry of a slipknot and a cysteine slipknot. The top panel corresponds to a
genuine slipknot. The bottom left panel is a schematic representation of the native geometry that yields
the cysteine slipknot on stretching. The resulting cysteine slipknot motif is shown in the bottom right
panel.

Figure 11. Dependence of $F_{\max}$ on the pulling velocity for the proteins indicated. $v_0$
corresponds to 1 Å/τ, which is of order $10^8$ nm/s. The data for several top strength proteins are shown.

Figure 12. Theoretical $F^{te}_{\max}$ extrapolated to the pulling speeds used experimentally vs.
the corresponding experimental value, $F^{e}_{\max}$. The solid line indicates the best slope of 1/(110
pN). The dotted line corresponds to the previous result of 1/(71 pN) obtained in ref. [14], where no
extrapolation was made. The inset shows a similar plot in which the extrapolation is not implemented
(the values denoted as $F^{t}_{\max}$ in Table 6). The list of the proteins used is provided by Table 6. It comprises almost
all cases considered in ref. [14] but it also includes the recent data points obtained for the scaffoldin
proteins [23] and the GFP [58]. The numerical symbols used in the Figure match the listing number in
Table 6.
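
The slope quoted in the caption of Figure 12 converts the model force unit ε/Å into piconewtons. The sketch below shows one way such a slope could be estimated, a least-squares line through the origin. Whether the authors constrained their fit in this way is not stated, so this choice, the function names, and the use of only the five complete entries 34 to 38 of Table 6 (with the sixth column read as the extrapolated force) are assumptions. The result therefore need not reproduce the published value of 110 pN.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope s of the line y = s*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def to_piconewton(force_model_units, pn_per_model_unit=110.0):
    """Convert a force in ε/Å to pN; 110 pN per ε/Å is the slope quoted for Figure 12."""
    return force_model_units * pn_per_model_unit


# Entries 34-38 of Table 6 only (a subset, used purely for illustration):
# experimental force in pN and extrapolated theoretical force in ε/Å.
f_exp_pn = [190.0, 480.0, 425.0, 214.0, 15.0]
f_theory = [1.69, 3.69, 3.22, 2.55, 1.25]

slope = slope_through_origin(f_exp_pn, f_theory)  # in (ε/Å) per pN
pn_per_unit = 1.0 / slope                         # pN per ε/Å for this subset

print(f"subset estimate: 1 ε/Å ≈ {pn_per_unit:.0f} pN")
print(f"3.7 ε/Å ≈ {to_piconewton(3.7):.0f} pN with the quoted slope")
```

A fit with a free intercept, or one weighted by the experimental error bars, would be equally defensible and would give a somewhat different number.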



SUPPLEMENTARY INFORMATION 

Tables 

TABLE 1S: The predicted list of the strongest proteins, ctd.

| n | PDBid | N | Fmax [ε/Å] | Lmax [Å] | λ | CATH | SCOP |
| 82 | 2duk | 138 | 3.8 | 242.2 | 0.43 | | |
| 83 | 1hqp | 149 | 3.8 | 197.7 | 0.32 | 2.40.128.20 | b.60.1.1 |
| 84 | 1cuu | 197 | 3.8 | 421.4 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 85 | 1afk | 124 | 3.8 | 175.4 | 0.33 | 3.10.130.10 | d.5.1.1 |
| 86 | 2o5w | 147 | 3.8 | 214.8 | 0.33 | | |
| 87 | 1xzc | 197 | 3.8 | 422.9 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 88 | 1qoz | 206 | 3.8 | 238.7 | 0.29 | 3.40.50.1820 | c.69.1.30 |
| 89 | 3pcf | 200 | 3.8 | 251.2 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 90 | 1odi | 234 | 3.8 | 458.8 | 0.51 | 3.40.50.1580 | c.56.2.1 |
| 91 | 1y2x | 142 | 3.8 | 38.1 | 0.01 | 2.60.270.20 | b.97.1.2 |
| 92 | 1mbq | 220 | 3.8 | 106.8 | 0.09 | 2.40.10.10 | b.47.1.2 |
| 93 | 1bj7 | 150 | 3.8 | 195.1 | 0.31 | 2.40.128.20 | b.60.1.1 |
| 94 | 1odl | 234 | 3.8 | 458.8 | 0.51 | 3.40.50.1580 | c.56.2.1 |
| 95 | 11x5 | 104 | 3.8 | 21.7 | 0.01 | 2.10.90.10 | g.7.1.3 |
| 96 | 1cuz | 196 | 3.8 | 416.5 | 0.54 | 3.40.50.1820 | c.69.1.30 |
| 97 | 3pch | 200 | 3.8 | 252.3 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 98 | 1oxm | 196 | 3.8 | 418.8 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 99 | 1h2p | 125 | 3.8 | 129.7 | 0.14 | 2.10.70.10 | g.18.1.1 |
| 100 | 1gwy | 175 | 3.8 | 143.2 | 0.18 | 2.60.270.20 | b.97.1.1 |
| 101 | 1cud | 197 | 3.8 | 420.1 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 102 | 1vvd | 118 | 3.8 | 112.1 | 0.14 | 2.10.70.10 | g.18.1.1 |
| 103 | 1hfh | 120 | 3.8 | 107.3 | 0.15 | 2.10.70.10 | g.18.1.1 |
| 104 | 1vvc | 118 | 3.8 | 112.9 | 0.14 | 2.10.70.10 | g.18.1.1 |
| 105 | 1cuv | 197 | 3.8 | 420.3 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 106 | 1c77 | 130 | 3.8 | 109.5 | 0.18 | 3.10.20.130 | d.15.5.1 |
| 107 | 1xuk | 223 | 3.8 | 115.0 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 108 | 1c2k | 223 | 3.8 | 114.4 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 109 | 2stb | 222 | 3.8 | 113.1 | 0.09 | 2.40.10.10 | b.47.1.2 |
| 110 | 3tgi | 223 | 3.8 | 110.5 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 111 | 3byr | 88 | 3.8 | 196.8 | 0.57 | | |
| 112 | 1a0j | 223 | 3.8 | 111.6 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 113 | 2pcd | 200 | 3.7 | 251.3 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 114 | 1vve | 118 | 3.7 | 104.3 | 0.11 | 2.10.70.10 | g.18.1.1 |
| 115 | 2pf6 | 231 | 3.7 | 488.0 | 0.51 | | |
| 116 | 3pcl | 200 | 3.7 | 252.5 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 117 | 1afl | 124 | 3.7 | 175.3 | 0.33 | 3.10.130.10 | d.5.1.1 |






| 118 | 1bs9 | 207 | 3.7 | 241.4 | 0.29 | 3.40.50.1820 | c.69.1.30 |
| 119 | 1tpa | 223 | 3.7 | 113.4 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 120 | 3rn3 | 124 | 3.7 | 168.5 | 0.31 | 3.10.130.10 | d.5.1.1 |
| 121 | 2grk | 228 | 3.7 | 135.6 | 0.12 | 2.60.240.10 | |
| 122 | 1xzh | 197 | 3.7 | 421.1 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 123 | 1xui | 223 | 3.7 | 113.7 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 124 | 1rpg | 124 | 3.7 | 204.4 | 0.40 | 3.10.130.10 | d.5.1.1 |
| 125 | 1xuj | 223 | 3.7 | 115.3 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 126 | 1bra | 223 | 3.7 | 112.1 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 127 | 1rtb | 124 | 3.7 | 202.9 | 0.39 | 3.10.130.10 | d.5.1.1 |
| 128 | 1clo | 223 | 3.7 | 113.8 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 129 | 1gkg | 136 | 3.7 | 142.9 | 0.21 | 2.10.70.10 | g.18.1.1 |
| 130 | 1c5v | 223 | 3.7 | 115.0 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 131 | 1tnk | 223 | 3.7 | 113.5 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 132 | 1tzh | 94 | 3.7 | 67.4 | 0.10 | 2.10.90.10 | b.1.1.1 |
| 133 | 1ckl | 126 | 3.7 | 127.3 | 0.16 | 2.10.70.10 | g.18.1.1 |
| 134 | 2fwu | 157 | 3.7 | 88.2 | 0.10 | | b.1.27.1 |
| 135 | 1aqp | 124 | 3.7 | 204.7 | 0.40 | 3.10.130.10 | d.5.1.1 |
| 136 | 2g4x | 124 | 3.7 | 205.1 | 0.40 | 3.10.130.10 | d.5.1.1 |
| 137 | 2sta | 222 | 3.7 | 114.3 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 138 | 1h03 | 125 | 3.7 | 128.9 | 0.14 | 2.10.70.10 | g.18.1.1 |
| 139 | 1mtv | 223 | 3.7 | 112.2 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 140 | 1co7 | 223 | 3.7 | 110.7 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 141 | 2olc | 147 | 3.7 | 215.3 | | | |
| 142 | 1ane | 223 | 3.7 | 110.7 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 143 | 1utl | 222 | 3.7 | 113.1 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 144 | 1btp | 223 | 3.7 | 113.2 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 145 | 1xuh | 223 | 3.7 | 115.4 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 146 | 1o72 | 175 | 3.7 | 142.9 | 0.18 | 2.60.270.20 | b.97.1.1 |
| 147 | 2ofc | 141 | 3.7 | 37.7 | 0.01 | 2.60.270.20 | |
| 148 | 6rsa | 124 | 3.7 | 203.7 | 0.39 | 3.10.130.10 | d.5.1.1 |
| 149 | 2ofe | 141 | 3.7 | 37.8 | 0.01 | 2.60.270.20 | |
| 150 | 2ofd | 141 | 3.7 | 37.9 | 0.01 | 2.60.270.20 | |
| 151 | 2aso | 206 | 3.7 | 4/0.0 | c o 00 | | |
| 152 | 1y3y | 223 | 3.7 | 111.9 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 153 | 1xi0 | 143 | 3.7 | 41.1 | 0.01 | 2.60.270.20 | |
| 154 | 1h9i | 223 | 3.7 | 110.4 | 0.09 | 2.40.10.10 | b.47.1.2 |
| 155 | 2dsc | 195 | 3.7 | 429.0 | 0.56 | | |
| 156 | 1w4o | 124 | 3.7 | 203.3 | 0.39 | 3.10.130.10 | d.5.1.1 |
| 157 | 1lqe | 223 | 3.7 | 114.8 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 158 | 1tgn | 222 | 3.7 | 112.6 | 0.09 | 2.40.10.10 | b.47.1.2 |
| 159 | 1tnl | 223 | 3.7 | 114.3 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 160 | 1otx | 236 | 3.6 | 463.5 | 0.50 | 3.40.50.1580 | c.56.2.1 |






| 161 | 1ffe | 197 | 3.6 | 420.9 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 162 | 1bju | 223 | 3.6 | 113.8 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 163 | 1anb | 223 | 3.6 | 111.5 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 164 | 1ssa | 113 | 3.6 | 163.2 | 0.36 | 3.10.130.10 | d.5.1.1 |
| 165 | 1c9p | 222 | 3.6 | 114.1 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 166 | 1tx6 | 223 | 3.6 | 111.7 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 167 | 2fws | 139 | 3.6 | 91.9 | 0.12 | | b.1.27.1 |
| 168 | 1jl6 | 223 | 3.6 | 111.5 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 169 | 2g4w | 124 | 3.6 | 204.0 | 0.40 | 3.10.130.10 | d.5.1.1 |
| 170 | 3pca | 200 | 3.6 | 251.3 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 171 | 3pce | 200 | 3.6 | 251.8 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 172 | 1fy8 | 215 | 3.6 | 112.8 | 0.09 | 2.40.10.10 | b.47.1.2 |
| 173 | 3pci | 200 | 3.6 | 251.5 | 0.31 | 2.60.130.10 | b.3.6.1 |
| 174 | 1vc8 | 126 | 3.6 | 199.1 | 0.37 | | d.113.1.1 |
| 175 | 2a2g | 158 | 3.6 | 208.4 | 0.32 | 2.40.128.20 | b.60.1.1 |
| 176 | 2p78 | 171 | 3.6 | 168.7 | 0.23 | 3.40.50.1240 | |
| 177 | 1c78 | 130 | 3.6 | 109.8 | 0.18 | 3.10.20.130 | d.15.5.1 |
| 178 | 1xzg | 197 | 3.6 | 421.8 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 179 | 2boc | 219 | 3.6 | 228.3 | 0.23 | 2.60.40.10 | f.14.1.1 |
| 180 | 1cuy | 197 | 3.6 | 420.3 | 0.55 | 3.40.50.1820 | c.69.1.30 |
| 181 | 2d3j | lo ( | 3.6 | 99.8 | u 1 | | |
| 182 | 2pqx | 245 | 3.6 | 147.4 | 0.12 | | |
| 183 | 1ql9 | 223 | 3.6 | 110.7 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 184 | 1ntp | 223 | 3.6 | 114.0 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 185 | 1fmg | 223 | 3.6 | 115.1 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 186 | 1sxt | 224 | 3.6 | 415.9 | 0.48 | 2.40.50.110 | b.40.2.2 |
| 187 | 1c2d | 223 | 3.6 | 133.6 | 0.12 | 2.40.10.10 | b.47.1.2 |
| 188 | 1ppe | 223 | 3.6 | 113.9 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 189 | 1ane | 223 | 3.6 | 113.3 | 0.10 | 2.40.10.10 | b.47.1.2 |
| 190 | 1xzb | 197 | 3.6 | 421.2 | 0.55 | 3.40.50.1820 | c.69.1.30 |



Table 1S. Continuation of Table 1 of the main text.

TABLE 2S: Identification of a mechanical clamp for selected proteins. $F_{\max}$ denotes the
mechanical resistance obtained when all native contacts are present. $F'_{\max}$ is the force obtained when
some sets of the relevant native contacts are removed.

| rank | PDB | Fmax [ε/Å] | F'max [ε/Å] | F'max [ε/Å] |
| 1 | 1vpf | 5.31 | 4.72 - slipknot loop | 1.96 - polymer |
| 7 | 2h64 | 4.62 | 4.65 - slipknot loop | 2.84 - polymer |
| 19 | 2c7w | 4.23 | 4.25 - slipknot loop | 2.15 - polymer |

Figure S1. Origin of the bimodality in the force distribution for the SMAD/FHA b.47
fold. (a) Structure of trypsin 1bra (N = 245). The mechanically crucial disulphide bond between sites
128 and 232 is highlighted in red. (b) Structure of elastase 1elc (N = 255), which belongs to the same
fold b.47.1.2 as 1bra. This structure does not contain two of the disulphide bonds that 1bra has. (c) The
force-displacement plot for 1bra. $F_{\max}$ corresponds to 3.7 ε/Å. The thinner line is obtained when the
128-232 disulphide bond is eliminated: $F_{\max}$ drops to 2.7 ε/Å. When one more disulphide bond is cut,
stretching continues to the distances shown in panel (d) without affecting $F_{\max}$. (d) The force-displacement
plot for 1elc. The corresponding $F_{\max}$ is 2.0 ε/Å. In the case of 1elc, stretching results in the terminal
helix pulling β strands from the inside of the protein and thus causing the inner β-barrel to unfold. In
the case of 1bra (with the disulphide bridge), the terminal helix pulls the neighbouring loop. After this
event, resistance grows linearly and forms one major force peak. After the peak, the whole structure
opens suddenly, rupturing contacts between strands in the β-barrel and in the neighbouring loops.



